Abstract. We investigate adaptive single-trial error/erasure decoding of binary codes whose decoder is able to correct $\varepsilon$ errors and $\tau$ erasures if $\lambda\varepsilon + \tau \le d_{\min} - 1$. Thereby, $d_{\min}$ is the minimum Hamming distance of the code and $1 < \lambda \le 2$ is the tradeoff parameter between errors and erasures. The error/erasure decoder allows soft information to be exploited by treating a set of most unreliable received symbols as erasures. The obvious question is how this erasing should be performed, i.e. how to determine the unreliable symbols which must be erased to obtain the smallest possible residual codeword error probability. This was answered before [1] for the case of fixed erasing, where only the channel state and not the individual symbol reliabilities are taken into consideration. In this paper, we address the adaptive case, where the optimal erasing strategy is determined for every given received vector.

I. Introduction 

The idea of exploiting soft information from the transmission channel using hard-decision algebraic error/erasure decoders dates back to Forney [2], [3]. His Generalized Minimum Distance (GMD) decoding scheme applies a Bounded Minimum Distance (BMD) error/erasure decoder repeatedly, each time with a different number of erased most unreliable received symbols. Forney proved that the residual codeword error probability of GMD decoding approaches that of Maximum Likelihood (ML) decoding if the channel is good and the number of decoding trials is [...], where $d_{\min}$ is the minimum Hamming distance of the code. This explains why
GMD decoding is frequently applied for concatenated coding 
schemes. There, the inner code is responsible for correcting a 
considerable amount of transmission channel errors. Thus, the 
input symbols for the outer decoder can be viewed as being 
transmitted over a super channel, which is composed of the 
transmission channel and the inner decoder. This super channel 
is always good if the parameters of the inner code are chosen 
appropriately. 

The fundamental task of GMD decoding with a given number
of decoding trials is to find an erasing strategy which either
maximizes the guaranteed decoding radius or minimizes the
residual codeword error probability. Both measures can be
optimized either in a fixed manner or adaptively. For fixed
erasing, the erasing strategy depends only on the state of the
transmission channel and remains unchanged for each received
vector. The fixed approach essentially optimizes the overall 
worst-case measure. Adaptive erasing on the other hand takes 
every single received vector into consideration, choosing the 
optimal erasing strategy for exactly this specific received 
vector. Obviously, one can expect the adaptive approach to 
yield better results than the fixed approach, especially for 
mediocre channel conditions. 

Different settings for optimal fixed erasing have been considered in [3]-[6] (radius maximization), [1], [7] (error probability minimization), and [8], [9] (both). Results about adaptive erasing can be found in [6], [10]-[13] (radius maximization).

In the present paper, we tackle the previously unconsidered 
problem of adaptive erasing with the target of minimizing the 
residual codeword error probability. In doing so, we restrict 
ourselves to one single decoding trial. This restriction allows us
to focus on the core of the problem and will be relaxed
in future work. Furthermore, we assume binary antipodal
signaling and a memoryless channel with soft output. The 
Additive White Gaussian Noise (AWGN) channel will serve 
as our main example for such channels. 

The paper is organized as follows. In Section II we briefly describe error/erasure decoding and introduce some required notation. In Section III we derive an adaptive erasing strategy which minimizes the residual codeword error probability. In doing so, we apply basic techniques from probability theory such as discrete random variables and probability generating functions. A computationally more efficient version of the erasing strategy is given in Section IV. Simulation results are given in Section V, conclusions and an outlook to further research in Section VI.

II. Error/Erasure Decoding 

We consider a binary code $\mathcal{C}(2; n, k, d_{\min})$ with length $n$, dimension $k$ and minimum Hamming distance $d_{\min}$. For $\mathcal{C}$, we have a $\lambda$-extended Bounded Distance error/erasure decoder, or simply $\lambda$-decoder, $\mathrm{dec}_{\mathcal{C}}(\cdot)$ which is able to correct $\varepsilon$ errors and $\tau$ erasures if $\lambda\varepsilon + \tau \le d_{\min} - 1$. Here, $1 < \lambda \le 2$ is the tradeoff parameter between errors and erasures. For $\lambda = 2$, the decoder is a traditional BMD error/erasure decoder. For Bose-Chaudhuri-Hocquenghem (BCH) codes, such error/erasure decoders are described e.g. in [14], [15].
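The decoding condition of the $\lambda$-decoder can be checked mechanically. The following Python sketch assumes the condition reads $\lambda\varepsilon + \tau \le d_{\min} - 1$ (consistent with the BMD case $\lambda = 2$); the function name `within_decoding_radius` is hypothetical and only serves as an illustration.

```python
def within_decoding_radius(errors: int, erasures: int, d_min: int, lam: float = 2.0) -> bool:
    """Return True if `errors` errors and `erasures` erasures satisfy
    lam * errors + erasures <= d_min - 1 (assumed form of the condition)."""
    if errors < 0 or erasures < 0:
        raise ValueError("error and erasure counts must be nonnegative")
    return lam * errors + erasures <= d_min - 1


# Illustration with d_min = 31 and a BMD decoder (lam = 2):
# 15 errors without erasures are correctable, as are 13 errors plus 4 erasures,
# while 14 errors plus 4 erasures exceed the radius.
print(within_decoding_radius(15, 0, 31))
print(within_decoding_radius(13, 4, 31))
print(within_decoding_radius(14, 4, 31))
```

Each erasure costs one unit of the budget $d_{\min} - 1$, while each error costs $\lambda$ units, which is why erasing a symbol that is likely to be wrong can pay off.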



At the transmitter, an information vector $a \in \mathbb{F}_2^k$ is encoded into a codeword $c \in \mathcal{C} \subseteq \mathbb{F}_2^n$. The binary symbols $c_i$, $i = 0, \ldots, n-1$, are then mapped to binary antipodal signals $x_i := (-1)^{c_i} \in \{-1, +1\}$, which are transmitted over the channel. Each transmitted symbol $x_i$ is distorted by the channel to a received symbol $y_i \in \mathbb{R}$. The $\lambda$-decoder can only handle hard input, hence the real received symbols must be mapped to symbols of the binary field $\mathbb{F}_2$. This can be accomplished by the Heaviside-like function

$$\alpha : \mathbb{R} \to \mathbb{R}, \quad y \mapsto \begin{cases} -1, & \text{if } y < 0 \\ +1, & \text{if } y \ge 0 \end{cases},$$

which essentially extracts the sign of a real received symbol, and the inverse mapping function

$$\beta : \{-1, +1\} \to \mathbb{F}_2, \quad y \mapsto \begin{cases} 1, & \text{if } y = -1 \\ 0, & \text{if } y = +1 \end{cases},$$

which maps real symbols to symbols of $\mathbb{F}_2$. The binary vector

$$r := (\beta(\alpha(y_0)), \ldots, \beta(\alpha(y_{n-1}))) \tag{1}$$

is a distorted version of the transmitted codeword $c$ and could be fed into the $\lambda$-decoder for traditional errors-only decoding. Decoding would be successful if the number $\varepsilon := d_H(c, r)$ of errors in $r$ satisfies $\lambda\varepsilon \le d_{\min} - 1$ or, in more familiar notation, $\varepsilon \le \lfloor (d_{\min} - 1)/\lambda \rfloor$. Here, $d_H(\cdot, \cdot)$ is the Hamming distance between two vectors of equal length.
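The hard-decision mapping and the error count can be illustrated with a few lines of Python. This is a minimal sketch: the function names `alpha`, `beta`, `hard_decision` and `hamming_distance` mirror the symbols of the text but are otherwise illustrative, and the value $y = 0$ is assumed to map to $+1$.

```python
def alpha(y: float) -> int:
    """Sign extraction: -1 for negative y, +1 otherwise (y = 0 assumed to give +1)."""
    return -1 if y < 0 else 1


def beta(x: int) -> int:
    """Map an antipodal symbol back to the binary field: -1 -> 1, +1 -> 0."""
    if x not in (-1, 1):
        raise ValueError("beta expects -1 or +1")
    return 1 if x == -1 else 0


def hard_decision(received):
    """Binary vector r = (beta(alpha(y_0)), ..., beta(alpha(y_{n-1})))."""
    return [beta(alpha(y)) for y in received]


def hamming_distance(a, b) -> int:
    """Number of positions in which two equal-length vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have equal length")
    return sum(u != v for u, v in zip(a, b))


# Toy illustration (not a codeword of any particular code):
codeword = [0, 1, 1, 0]              # mapped to x = (+1, -1, -1, +1)
received = [0.9, -1.2, 0.3, 0.1]     # third symbol flipped its sign
r = hard_decision(received)
print(r, hamming_distance(codeword, r))
```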

Let $P_\sigma(\cdot \mid \cdot)$ be the transition probability of the memoryless channel, where the parameter $\sigma$ marks the channel state. Then, using Bayes' Theorem, we can calculate for each received symbol $y$ the probability $h_\sigma(y)$ that $-\alpha(y)$ was transmitted, i.e. that a transmission error occurred:

\begin{align*}
h_\sigma(y) &:= P_\sigma(-\alpha(y) \mid y) \\
&= \frac{P_\sigma(y \mid -\alpha(y)) \Pr(-\alpha(y))}{\Pr(y)} \\
&= \frac{P_\sigma(y \mid -\alpha(y)) \Pr(-\alpha(y))}{P_\sigma(y \mid \alpha(y)) \Pr(\alpha(y)) + P_\sigma(y \mid -\alpha(y)) \Pr(-\alpha(y))} \\
&= \frac{P_\sigma(y \mid -\alpha(y))}{P_\sigma(y \mid \alpha(y)) + P_\sigma(y \mid -\alpha(y))},
\end{align*}

where the last equality follows from the reasonable assumption $\Pr(-\alpha(y)) = \Pr(\alpha(y)) = \frac{1}{2}$ of equiprobable transmitted symbols. It is justified to call $h_\sigma(y)$ the unreliability value of the received symbol $y$. The greater $h_\sigma(y)$, the higher the probability that $y$ is an erroneous symbol. W.l.o.g. let us from now on assume that the symbols of the received vector $y$ (and by (1) also $r$) are ordered according to their unreliability value, i.e. $h_\sigma(y_0) \ge \cdots \ge h_\sigma(y_{n-1})$.
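For a concrete channel the unreliability values and their ordering are easy to compute. The sketch below assumes an AWGN channel with noise standard deviation $\sigma$ and antipodal symbols $\pm 1$, for which inserting the Gaussian likelihoods into the last expression gives $h_\sigma(y) = 1/(1 + \exp(2 y \alpha(y)/\sigma^2))$, the form used later in Example 1. The function names are hypothetical.

```python
import math


def unreliability_awgn(y: float, sigma: float) -> float:
    """h = 1 / (1 + exp(2*|y|/sigma^2)); note y * alpha(y) = |y|.

    Written as exp(-z) / (1 + exp(-z)) so that large |y| cannot overflow.
    """
    z = 2.0 * abs(y) / sigma ** 2
    e = math.exp(-z)
    return e / (1.0 + e)


def order_by_unreliability(received, sigma):
    """Return (order, h_sorted): indices sorted most unreliable first,
    and the corresponding unreliability values in non-increasing order."""
    h = [unreliability_awgn(y, sigma) for y in received]
    order = sorted(range(len(received)), key=lambda i: h[i], reverse=True)
    return order, [h[i] for i in order]


order, h_sorted = order_by_unreliability([0.9, -0.1, 0.4], sigma=math.sqrt(0.5))
print(order, h_sorted)
```

Keeping the permutation `order` makes it possible to undo the sorting before the vector is handed to the decoder. Symbols close to the decision threshold $y = 0$ obtain an unreliability close to $\frac{1}{2}$.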

We obtain a new received vector by erasing the $\tau$ most unreliable symbols in $r$. This new vector is denoted by

$$r_\tau := (\underbrace{\mathsf{X}, \ldots, \mathsf{X}}_{\tau \text{ times}}, r_\tau, \ldots, r_{n-1}).$$

The $\lambda$-decoder is capable of decoding $r_\tau$ as long as $\lambda\varepsilon + \tau \le d_{\min} - 1$, where $\varepsilon$ is the number of errors in the non-erased symbols $r_\tau, \ldots, r_{n-1}$. The number of erasures is the decoder's degree of freedom, so the task of an adaptive error/erasure decoder is as follows.

Problem 1 For a given received vector $y = (y_0, \ldots, y_{n-1})$ with ordered unreliabilities $h_\sigma(y_0) \ge \cdots \ge h_\sigma(y_{n-1})$ and channel state $\sigma$, find the optimal number $0 \le \tau^* \le d_{\min} - 1$ of erased most unreliable symbols such that the residual codeword error probability of decoding $r_{\tau^*}$ with the $\lambda$-decoder $\mathrm{dec}_{\mathcal{C}}(\cdot)$ is minimized.

In the following section we provide an exact solution to Problem 1 which is computationally expensive. In Section IV we give a very good approximate solution which is computationally efficient.
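Forming the erased vector $r_\tau$ from the sorted hard decisions is a one-line operation. The sketch below is only an illustration; the erasure marker `"X"` and the function name are assumptions, and a real decoder interface may represent erasures differently.

```python
ERASURE = "X"  # hypothetical erasure marker


def erase_most_unreliable(r_sorted, tau: int):
    """Replace the first tau symbols of r_sorted (assumed to be ordered from
    most to least unreliable) by the erasure marker."""
    if not 0 <= tau <= len(r_sorted):
        raise ValueError("tau must lie between 0 and the vector length")
    return [ERASURE] * tau + list(r_sorted[tau:])


print(erase_most_unreliable([1, 0, 0, 1, 1], 2))
```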

III. Derivation of an Adaptive Erasing Strategy 

To solve Problem 1, it is required to express the residual codeword error probability after adaptive error/erasure decoding as a function of the number $\tau$ of erased symbols. We accomplish this using basic techniques from probability theory.

Let the discrete random variables $X_i$, $i = 0, \ldots, n-1$, be defined by

$$X_i := \begin{cases} 1, & \text{if } y_i \text{ is erroneous } (\alpha(y_i) \ne x_i) \\ 0, & \text{if } y_i \text{ is correct } (\alpha(y_i) = x_i) \end{cases}.$$

The probabilities of the two possible values of $X_i$ are determined by the unreliability value of symbol $y_i$, i.e. $\Pr(X_i = 1) = h_\sigma(y_i)$ and $\Pr(X_i = 0) = 1 - h_\sigma(y_i)$.

Since $X_i$ takes on only nonnegative integer values, its probability generating function (PGF) [16], [17] is given by

\begin{align*}
G_{\sigma, X_i}(p) &:= \mathrm{E}\{p^{X_i}\} \tag{2} \\
&= \Pr(X_i = 0) + p \Pr(X_i = 1) \\
&= 1 - h_\sigma(y_i) + p\, h_\sigma(y_i).
\end{align*}

Assume that the $\tau$ most unreliable symbols of $r$ are erased and $r_\tau$ is fed into the $\lambda$-decoder. Then, there are $\varepsilon$, $0 \le \varepsilon \le n - \tau$, erroneous symbols among the $n - \tau$ non-erased symbols. We can model their number with a new random variable $Y_\tau$ using the random variables $X_i$, $i = \tau, \ldots, n-1$:

$$Y_\tau := \sum_{i=\tau}^{n-1} X_i.$$

We obtain

\begin{align*}
G_{\sigma, Y_\tau}(p) &:= \mathrm{E}\{p^{Y_\tau}\} \\
&= \mathrm{E}\{p^{X_\tau + \cdots + X_{n-1}}\} \\
&= \mathrm{E}\{p^{X_\tau} \cdots p^{X_{n-1}}\} \tag{3} \\
&= \mathrm{E}\{p^{X_\tau}\} \cdots \mathrm{E}\{p^{X_{n-1}}\} \\
&= \prod_{i=\tau}^{n-1} G_{\sigma, X_i}(p) \tag{4}
\end{align*}

for the PGF of $Y_\tau$, i.e. the PGF of $Y_\tau$ is the product of the PGFs of $X_\tau, \ldots, X_{n-1}$ and thereby known. Note that the expectation of the product in (3) can be written as a product of expectations since the channel is memoryless and thus the $X_i$ are independent. The product (4) results directly from the definition (2) of the $G_{\sigma, X_i}$.

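Using the PGF of $Y_\tau$ we can calculate the probability that there are $\varepsilon$, $0 \le \varepsilon \le n - \tau$, errors in $r_\tau$ by

$$\Pr(Y_\tau = \varepsilon) = \frac{G^{(\varepsilon)}_{\sigma, Y_\tau}(p)\big|_{p=0}}{\varepsilon!}, \tag{5}$$

where the superscript $(\varepsilon)$ denotes the $\varepsilon$-th derivative.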
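Because the PGF of $Y_\tau$ is a polynomial in $p$, its $\varepsilon$-th derivative at $p = 0$ divided by $\varepsilon!$ is simply the coefficient of $p^\varepsilon$. The probabilities $\Pr(Y_\tau = \varepsilon)$ can therefore be read off after multiplying out the degree-1 factors $1 - h_\sigma(y_i) + p\, h_\sigma(y_i)$. The following Python sketch does this by plain polynomial multiplication, which is a simple stand-in and not the FFT-based or Horner-based procedures discussed in the text; the function name `error_count_pmf` is an assumption.

```python
def error_count_pmf(h_tail):
    """Coefficients of prod_i (1 - h_i + h_i * p) in increasing powers of p.

    For independent error indicators with Pr(X_i = 1) = h_i, entry e of the
    result equals Pr(sum_i X_i = e).
    """
    pmf = [1.0]  # PGF of the empty sum is the constant polynomial 1
    for h in h_tail:
        nxt = [0.0] * (len(pmf) + 1)
        for e, prob in enumerate(pmf):
            nxt[e] += prob * (1.0 - h)   # symbol correct: error count stays at e
            nxt[e + 1] += prob * h       # symbol erroneous: error count becomes e + 1
        pmf = nxt
    return pmf


# Two non-erased symbols with unreliabilities 0.5 and 0.25:
# (0.5 + 0.5 p)(0.75 + 0.25 p) = 0.375 + 0.5 p + 0.125 p^2
print(error_count_pmf([0.5, 0.25]))
```

The cost of this direct multiplication is quadratic in the number of non-erased symbols for each $\tau$, so it is meant for illustration and small experiments only.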

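Recall that the $\lambda$-decoder is capable of decoding $\varepsilon$ errors and $\tau$ erasures if $\lambda\varepsilon + \tau \le d_{\min} - 1$. In case of $\tau$, $0 \le \tau \le d_{\min} - 1$, erasures the decoder will fail if the number of errors in the non-erased symbols is greater than $\frac{d_{\min} - 1 - \tau}{\lambda}$. Using (5), the probability of this event is determined by

$$\Pr\left(Y_\tau > \frac{d_{\min} - 1 - \tau}{\lambda}\right) = 1 - \sum_{\varepsilon=0}^{\lfloor (d_{\min} - 1 - \tau)/\lambda \rfloor} \Pr(Y_\tau = \varepsilon) =: P_\sigma(\tau). \tag{6}$$

$P_\sigma(\tau)$ is the residual codeword error probability as a function of the channel state $\sigma$ and the number $\tau$ of erased symbols. Hence, the optimal choice of $\tau$ is

\begin{align*}
\tau^* &:= \arg\min_{0 \le \tau \le d_{\min} - 1} \{P_\sigma(\tau)\} \tag{7} \\
&= \arg\max_{0 \le \tau \le d_{\min} - 1} \left\{ \sum_{\varepsilon=0}^{\lfloor (d_{\min} - 1 - \tau)/\lambda \rfloor} \Pr(Y_\tau = \varepsilon) \right\}. \tag{8}
\end{align*}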

The residual codeword error probability is minimized by erasing the $\tau^*$ most unreliable symbols since from (7) we obtain

$$P_\sigma(\tau^*) = \min_{0 \le \tau \le d_{\min} - 1} \{P_\sigma(\tau)\},$$

which proves that adaptive erasing with $\tau^*$ as in (7) is at least as good as errors-only decoding with $\tau = 0$ and single-trial fixed erasing with some $\tau_{\mathrm{fixed}}$, $0 \le \tau_{\mathrm{fixed}} \le d_{\min} - 1$, in terms of the achievable residual codeword error probability.
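The exhaustive search over $\tau$ can be sketched in a few lines of Python. The sketch assumes that the unreliabilities are already sorted in non-increasing order, that $d_{\min} - 1 \le n$, and that the decoder fails exactly when the number of errors exceeds $(d_{\min} - 1 - \tau)/\lambda$. It uses direct polynomial multiplication instead of the FFT and Horner techniques referred to in the text, and all function names are hypothetical.

```python
import math


def error_count_pmf(h_tail):
    """Entry e is Pr(exactly e errors) for independent error probabilities h_tail."""
    pmf = [1.0]
    for h in h_tail:
        nxt = [0.0] * (len(pmf) + 1)
        for e, prob in enumerate(pmf):
            nxt[e] += prob * (1.0 - h)
            nxt[e + 1] += prob * h
        pmf = nxt
    return pmf


def residual_error_probability(h_sorted, tau, d_min, lam=2.0):
    """P(tau) = 1 - sum_{e <= floor((d_min - 1 - tau) / lam)} Pr(Y_tau = e)."""
    pmf = error_count_pmf(h_sorted[tau:])
    max_errors = math.floor((d_min - 1 - tau) / lam)
    return 1.0 - sum(pmf[: max_errors + 1])


def optimal_erasures(h_sorted, d_min, lam=2.0):
    """Return (tau_star, P(tau_star)) by exhaustive search over tau = 0..d_min-1.

    Ties are resolved in favor of the smallest tau (an arbitrary choice)."""
    best_tau = 0
    best_p = residual_error_probability(h_sorted, 0, d_min, lam)
    for tau in range(1, d_min):
        p = residual_error_probability(h_sorted, tau, d_min, lam)
        if p < best_p:
            best_tau, best_p = tau, p
    return best_tau, best_p


# Toy input: two rather unreliable symbols followed by three reliable ones.
h_sorted = [0.4, 0.3, 0.05, 0.05, 0.01]
print(optimal_erasures(h_sorted, d_min=3))
```

In this toy case erasing both unreliable symbols is preferable to errors-only decoding, because the two leading symbols are likely to be wrong while the remaining ones are almost certainly correct.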

Using the results from this section we can state Algorithm 1 for optimal adaptive error/erasure decoding. It provides an exact solution for Problem 1.

The drawback of Algorithm 1 is its computational complexity. Sorting a vector of length $n$ in line 2 has complexity $O(n^2)$ and can be accomplished in place e.g. by the bubble sort algorithm [18]. Calculating the PGFs $G_{\sigma, Y_\tau}(p)$, $\tau = 0, \ldots, d_{\min} - 1$, in lines 4-5 essentially means multiplying $n$ polynomials $G_{\sigma, X_i}(p)$, each with degree 1. This can be done efficiently using $n$ Fast Fourier Transforms (FFT) of length $n$ and componentwise multiplication of the frequency domain coefficients. Since the input polynomials for the FFT have degree 1 (i.e. only two non-zero coefficients), 2-pruned FFTs [19] with complexity $O(n)$ can be used. The $n$ 2-pruned FFTs together have complexity $O(n^2)$ and the number of componentwise multiplications is $n^2$. The required single inverse FFT of length $n$ has complexity $O(n \log(n))$.



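Algorithm 1: Optimal Adaptive Error/Erasure Decoding
input : $\mathcal{C}(2; n, k, d_{\min})$, $y \in \mathbb{R}^n$, $\sigma$, $\lambda$-decoder $\mathrm{dec}_{\mathcal{C}}(\cdot)$
 1  calculate $h_\sigma(y_0), \ldots, h_\sigma(y_{n-1})$
 2  sort $y$ s.t. $h_\sigma(y_0) \ge \cdots \ge h_\sigma(y_{n-1})$    // $O(n^2)$
 3  $r \leftarrow (\beta(\alpha(y_0)), \ldots, \beta(\alpha(y_{n-1})))$
 4  for $\tau = 0, \ldots, d_{\min} - 1$ do
 5      calculate $G_{\sigma, Y_\tau}(p)$    // $O(n^2)$
 6  $m \leftarrow 1$
 7  for $\tau \leftarrow 0$ to $d_{\min} - 1$ do    // $O(n^2 d_{\min})$
 8      for $\varepsilon = 0, \ldots, \lfloor \frac{d_{\min} - 1 - \tau}{\lambda} \rfloor$ do
 9          calculate $G^{(\varepsilon)}_{\sigma, Y_\tau}(p)\big|_{p=0}$    // $O(n^2)$
10      if $P_\sigma(\tau) < m$ then    // $O(n d_{\min})$
11          $m \leftarrow P_\sigma(\tau)$
12          $\tau^* \leftarrow \tau$
13  calculate $r_{\tau^*}$ from $r$
14  revoke sorting of $r_{\tau^*}$
15  return $\mathrm{dec}_{\mathcal{C}}(r_{\tau^*})$    // $O(n^2)$
output: codeword estimate $\hat{c} \in \mathcal{C}$ or erasure $\mathsf{X}$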



Hence, the complexity of lines 4-5 is $O(n^2)$. The loop in lines 8-9 requires the evaluation of $\lfloor \frac{d_{\min} - 1 - \tau}{\lambda} \rfloor + 1$ derivatives at $p = 0$. This can be accomplished with complexity $O\left(\left(\lfloor \frac{d_{\min} - 1 - \tau}{\lambda} \rfloor + 1\right) n\right) \subseteq O(n^2)$ using an algorithm from Pankiewicz [20] which is based on Horner's Scheme. The resulting values are required for the calculation of the $P_\sigma(\tau)$ in line 10 as in (6). For each $P_\sigma(\tau)$, a sum over $\lfloor \frac{d_{\min} - 1 - \tau}{\lambda} \rfloor + 1$ probabilities $\Pr(Y_\tau = \varepsilon)$ has to be calculated. Using the pre-computed values from lines 8-9, this can be accomplished with complexity $O(n d_{\min})$. Since the loop in lines 7-12 is executed $d_{\min}$ times, its complexity is $O(n^2 d_{\min})$. The complexity for $\lambda$-decoding in line 15 is $O(n^2)$. Altogether the computational complexity of Algorithm 1 is $O(n^2 d_{\min}) \subseteq O(n^3)$.

Section IV addresses a computationally more efficient version of the algorithm which uses very good approximations of the $P_\sigma(\tau)$.

Example 1 We consider the BCH code $\mathcal{C}(2; 127, 36, 31)$ with a traditional BMD error/erasure decoder, i.e. $\lambda = 2$. The symbols $\{-1, +1\}$ are transmitted over an AWGN channel. In this case, the unreliability of received symbol $y$ is

$$h_\sigma(y) = h_{\sigma, \mathrm{AWGN}}(y) := \frac{1}{1 + \exp\left(\frac{2 y \alpha(y)}{\sigma^2}\right)}.$$

Throughout the paper $\exp(\cdot)$ and $\log(\cdot)$ have base $e$. We assume SNR = 0 dB, and obtain $\sigma = \sqrt{0.5}$.

Figure 1 depicts the operation of the loop in lines 7-12 of Algorithm 1. For each $\tau = 0, \ldots, 30$ and $\varepsilon = 0, \ldots, \lfloor \frac{30 - \tau}{2} \rfloor$ the probabilities $\Pr(Y_\tau = \varepsilon)$ are calculated. Each $\Pr(Y_\tau = \varepsilon)$ is represented by one point in Figure 1. This allows the sums in the maximization term of (8) to be calculated. Each of the sums is the sum over one slice of the point surface in Figure 1 in $\varepsilon$-direction. The optimal choice of $\tau$ is the slice whose sum is maximal. For the considered codeword/transmission/received vector, the optimization yields $\tau^* = 4$.

Fig. 1. Point surface consisting of the probabilities $\Pr(Y_\tau = \varepsilon)$, where $\tau = 0, \ldots, 30$ and $\varepsilon = 0, \ldots, \lfloor \frac{30 - \tau}{2} \rfloor$.



IV. Computationally Efficient Adaptive Erasing 

In this section, we present a technique which allows the computational complexity of Algorithm 1 to be reduced from cubic in $n$ to $O(n^2 \sqrt{n})$. It utilizes an approximation of the probabilities $P_\sigma(\tau)$, $\tau = 0, \ldots, d_{\min} - 1$. This approximation is based on the following result by Hoeffding [21].

Theorem 1 (Hoeffding Bound) Let $A_0, \ldots, A_{m-1}$ be $m$ independent random variables with finite first and second moments, which are almost surely bounded, i.e.

$$\Pr(A_i - \mathrm{E}\{A_i\} \in [a_i, b_i]) = 1, \quad i = 0, \ldots, m - 1,$$

where $\mathrm{E}\{\cdot\}$ denotes the expectation of a random variable. Then, for the sum $S = A_0 + \cdots + A_{m-1}$ and $t > 0$,

$$\Pr(|S - \mathrm{E}\{S\}| \ge mt) \le 2 \exp\left(-\frac{2 m^2 t^2}{\sum_{i=0}^{m-1} (b_i - a_i)^2}\right).$$

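We apply Theorem 1 to $Y_\tau = \sum_{i=\tau}^{n-1} X_i$, i.e. $m = n - 1 - \tau$. By definition, we have $X_i \in \{0, 1\}$ and thus

$$\sum_{i=0}^{m-1} (b_i - a_i)^2 = m = n - 1 - \tau.$$

We obtain

$$\Pr(|Y_\tau - \mathrm{E}\{Y_\tau\}| \ge t(n - 1 - \tau)) \le 2 \exp\left(-2 t^2 (n - 1 - \tau)\right).$$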

This means that the sum of the probabilities

$$\Pr(Y_\tau = 0), \ldots, \Pr(Y_\tau = \mathrm{E}\{Y_\tau\} - t), \; \Pr(Y_\tau = \mathrm{E}\{Y_\tau\} + t), \ldots, \Pr(Y_\tau = d_{\min} - 1) \tag{9}$$

is exponentially decreasing with $t$. We can conclude that the sum in (6) is dominated by only a small set of probabilities in proximity to the expectation $\mathrm{E}\{Y_\tau\}$. Let us set $t := \frac{s}{n - 1 - \tau}$. We obtain

$$\Pr(|Y_\tau - \mathrm{E}\{Y_\tau\}| \ge s) \le 2 \exp\left(-\frac{2 s^2}{n - 1 - \tau}\right) \le 2 \exp\left(-\frac{2 s^2}{n}\right),$$

i.e. the contribution of the probabilities from (9) in (6) is less than $2 \exp\left(-\frac{2 s^2}{n}\right)$. This fact can also be observed in Figure 1. The probabilities $\Pr(Y_\tau = \varepsilon)$ diminish quickly around the expectation of each slice in $\varepsilon$-direction. To obtain a good approximation (with precision goal $10^{-2}$), let us select $s$ such that

$$2 \exp\left(-\frac{2 s^2}{n}\right) \le 10^{-2} \iff s \ge \sqrt{-\frac{n}{2} \log(0.5 \cdot 10^{-2})}.$$

We define

$$s_0(n) := \sqrt{-\frac{n}{2} \log(0.5 \cdot 10^{-2})}, \quad n \in \mathbb{N} \setminus \{0\}.$$

Figure 2 shows the value of $s_0(n)$ for a practical range of code lengths $n$.

Fig. 2. Value of $s_0(n)$, $n = 1, \ldots, 2048$, for precision goal $10^{-2}$.
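The window half-width can be evaluated numerically. The following Python sketch assumes the reading $s_0(n) = \sqrt{-\frac{n}{2} \log(\frac{1}{2} \cdot 10^{-2})}$ with the natural logarithm, and generalizes the precision goal to a parameter; the function names `s0` and `hoeffding_tail_bound` are illustrative.

```python
import math


def s0(n: int, precision: float = 1e-2) -> float:
    """Smallest s with 2 * exp(-2 * s^2 / n) <= precision (natural logarithm)."""
    if n <= 0 or not 0.0 < precision < 2.0:
        raise ValueError("need n > 0 and 0 < precision < 2")
    return math.sqrt(-0.5 * n * math.log(0.5 * precision))


def hoeffding_tail_bound(s: float, n: int) -> float:
    """Upper bound 2 * exp(-2 * s^2 / n) on Pr(|Y - E{Y}| >= s)."""
    return 2.0 * math.exp(-2.0 * s * s / n)


for n in (31, 127, 2048):
    print(n, s0(n), hoeffding_tail_bound(s0(n), n))
```

Since $s_0(n)$ is proportional to $\sqrt{n}$, quadrupling the code length only doubles the window, which is the source of the complexity reduction discussed in this section.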

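Eventually, the Hoeffding bound justifies neglecting

$$\Pr(Y_\tau = 0), \ldots, \Pr(Y_\tau = \mathrm{E}\{Y_\tau\} - s_0(n)), \; \Pr(Y_\tau = \mathrm{E}\{Y_\tau\} + s_0(n)), \ldots, \Pr(Y_\tau = d_{\min} - 1)$$

in the sum of (6). As a result, we obtain very good approximations for $P_\sigma(\tau)$ if we calculate the sum in (6) over at most $2 s_0(n)$ elements, i.e.

$$P_\sigma(\tau) \approx \tilde{P}_\sigma(\tau) := 1 - \sum_{\varepsilon = \max\{\lceil \mathrm{E}\{Y_\tau\} \rceil - s_0(n),\, 0\}}^{\lfloor \min\{\mathrm{E}\{Y_\tau\} + s_0(n),\, (d_{\min} - 1 - \tau)/\lambda\} \rfloor} \Pr(Y_\tau = \varepsilon).$$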



The required expectation can be easily calculated using the PGF (4) of $Y_\tau$, i.e.

$$\mathrm{E}\{Y_\tau\} = G^{(1)}_{\sigma, Y_\tau}(p)\big|_{p=1}, \tag{10}$$

where the superscript $(1)$ denotes the first derivative.
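As a short worked step, the derivative of the product form of the PGF can be evaluated explicitly. Applying the product rule and using that every factor $1 - h_\sigma(y_j) + p\, h_\sigma(y_j)$ equals 1 at $p = 1$, the expectation reduces to the sum of the unreliabilities of the non-erased symbols, which explains why it can be obtained with linear complexity:

```latex
\begin{align*}
\mathrm{E}\{Y_\tau\}
  &= \frac{\mathrm{d}}{\mathrm{d}p} \prod_{i=\tau}^{n-1}
     \bigl(1 - h_\sigma(y_i) + p\, h_\sigma(y_i)\bigr) \Big|_{p=1} \\
  &= \sum_{i=\tau}^{n-1} h_\sigma(y_i)
     \prod_{\substack{j=\tau \\ j \ne i}}^{n-1}
     \bigl(1 - h_\sigma(y_j) + 1 \cdot h_\sigma(y_j)\bigr) \\
  &= \sum_{i=\tau}^{n-1} h_\sigma(y_i).
\end{align*}
```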

We use the previous results to state Algorithm 2 which solves Problem 1 with high precision and better computational complexity than Algorithm 1.

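Algorithm 2: Efficient Adaptive Error/Erasure Decoding
input : $\mathcal{C}(2; n, k, d_{\min})$, $y \in \mathbb{R}^n$, $\sigma$, $s_0(n)$, $\lambda$-decoder $\mathrm{dec}_{\mathcal{C}}(\cdot)$
 1  calculate $h_\sigma(y_0), \ldots, h_\sigma(y_{n-1})$
 2  sort $y$ s.t. $h_\sigma(y_0) \ge \cdots \ge h_\sigma(y_{n-1})$    // $O(n^2)$
 3  $r \leftarrow (\beta(\alpha(y_0)), \ldots, \beta(\alpha(y_{n-1})))$
 4  for $\tau = 0, \ldots, d_{\min} - 1$ do
 5      calculate $G_{\sigma, Y_\tau}(p)$    // $O(n^2)$
 6  $m \leftarrow 1$
 7  for $\tau \leftarrow 0$ to $d_{\min} - 1$ do    // $O(n \sqrt{n}\, d_{\min})$
 8      calculate $\mathrm{E}\{Y_\tau\}$
 9      $l \leftarrow \max\{\lceil \mathrm{E}\{Y_\tau\} \rceil - s_0(n), 0\}$
10      $u \leftarrow \lfloor \min\{\mathrm{E}\{Y_\tau\} + s_0(n), \frac{d_{\min} - 1 - \tau}{\lambda}\} \rfloor$
11      for $\varepsilon = l, \ldots, u$ do
12          calculate $G^{(\varepsilon)}_{\sigma, Y_\tau}(p)\big|_{p=0}$    // $O(n \sqrt{n})$
13      if $\tilde{P}_\sigma(\tau) < m$ then    // $O(n \sqrt{n})$
14          $m \leftarrow \tilde{P}_\sigma(\tau)$
15          $\tau^* \leftarrow \tau$
16  calculate $r_{\tau^*}$ from $r$
17  revoke sorting of $r_{\tau^*}$
18  return $\mathrm{dec}_{\mathcal{C}}(r_{\tau^*})$    // $O(n^2)$
output: codeword estimate $\hat{c} \in \mathcal{C}$ or erasure $\mathsf{X}$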

Algorithm 2 differs from Algorithm 1 in several places; we now briefly analyze the computational complexity of these differences.

Lines 1-6 remain unchanged: sorting, mapping to symbols of $\mathbb{F}_2$ and pre-calculation of the PGFs are the same for both the exact and the approximate algorithm. The loop in lines 7-15 starts with the calculation of the expectation $\mathrm{E}\{Y_\tau\}$ according to (10). This can be accomplished with linear complexity. In lines 9-10, lower and upper bounds for the loop in lines 11-12 are calculated, using essentially $\mathrm{E}\{Y_\tau\}$ and the input parameter $s_0(n)$. Since $s_0(n)$ grows with $\sqrt{n}$, the loop in lines 11-12 calculates the value of $O(\sqrt{n})$ subsequent derivatives of the PGF $G_{\sigma, Y_\tau}(p)$. The complexity of this calculation is $O(n \sqrt{n})$ using Pankiewicz's algorithm [20]. The calculation of $\tilde{P}_\sigma(\tau)$ in line 13 involves the summation of $O(\sqrt{n})$ probabilities $\Pr(Y_\tau = \varepsilon)$. Using the pre-computed values of the derivatives from lines 11-12, each $\Pr(Y_\tau = \varepsilon)$ can be calculated with complexity linear in $n$, hence $\tilde{P}_\sigma(\tau)$ can be calculated with complexity $O(n \sqrt{n})$. Note that calculating $P_\sigma(\tau)$ in Algorithm 1 is in $O(n d_{\min})$. Altogether, the loop in lines 7-15 is in $O(n \sqrt{n}\, d_{\min})$ and thus the overall complexity of Algorithm 2 is $O(n^2 \sqrt{n})$.
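The windowed sum used by the approximate algorithm can be sketched as follows. For clarity, this Python sketch still computes the full distribution of the error count by direct polynomial multiplication, so it illustrates the truncation of the sum but not the complexity saving. The integer parameter `window` stands in for $s_0(n)$ (for instance its value rounded up), and all function names are assumptions.

```python
import math


def error_count_pmf(h_tail):
    """Entry e is Pr(exactly e errors) for independent error probabilities h_tail."""
    pmf = [1.0]
    for h in h_tail:
        nxt = [0.0] * (len(pmf) + 1)
        for e, prob in enumerate(pmf):
            nxt[e] += prob * (1.0 - h)
            nxt[e + 1] += prob * h
        pmf = nxt
    return pmf


def approx_residual_error_probability(h_sorted, tau, d_min, window, lam=2.0):
    """1 minus the sum of Pr(Y_tau = e) over the window around E{Y_tau}.

    lower = max(ceil(E) - window, 0)
    upper = floor(min(E + window, (d_min - 1 - tau) / lam))
    """
    tail = h_sorted[tau:]
    pmf = error_count_pmf(tail)
    mean = sum(tail)                      # E{Y_tau} is the sum of the unreliabilities
    lower = max(math.ceil(mean) - window, 0)
    upper = math.floor(min(mean + window, (d_min - 1 - tau) / lam))
    upper = min(upper, len(pmf) - 1)      # guard against indexing past the support
    return 1.0 - sum(pmf[e] for e in range(lower, upper + 1))


h_sorted = [0.4, 0.3, 0.05, 0.05, 0.01]
print(approx_residual_error_probability(h_sorted, tau=2, d_min=3, window=10))
```

Because terms are only dropped from the sum, the approximation never falls below the exact value $P_\sigma(\tau)$, and it coincides with it as soon as the window covers the whole summation range.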



V. Simulation Results 

After the derivation of two adaptive error/erasure decoding algorithms in Sections III and IV, we devote this section to the analysis of their performance and behavior. First, we consider the short BCH code $\mathcal{C}(2; 31, 16, 7)$, a traditional BMD decoder with $\lambda = 2$ and an AWGN channel in the range between 0 dB and 6 dB.

Figure 3 shows the simulation results. The black curve (diamonds) denotes traditional errors-only decoding. The green curve (squares) shows the result of Algorithm 1. It is not distinguishable from the red curve (circles) showing the result of the computationally more efficient Algorithm 2. For reference, the figure also contains the result of error/erasure decoding with fixed erasing (blue curve, triangles) as in [1]. The latter result assumes very good channel conditions, hence its performance is poor in the considered range. However, there is a crossing point with the errors-only curve and we showed that the gain of optimal fixed erasing is 1.5 dB for an infinitely good channel. Note that the simulation confirms our observation from Section III that Algorithm 1 must be at least as good as errors-only decoding and error/erasure decoding with optimal fixed erasing.

Fig. 3. Simulation results for $\mathcal{C}(2; 31, 16, 7)$.

For the second simulation, we reconsider the setting of Example 1, i.e. the BCH code $\mathcal{C}(2; 127, 36, 31)$. We observe that Algorithm 2 enables a reduction of the residual codeword error probability starting at around SNR = 1 dB.

VI. Conclusions 

Despite the seminal results of Kötter and Vardy about algebraic soft-decision decoding [22] using the Guruswami-Sudan algorithm [23], pseudo-soft decoding with traditional algebraic error/erasure decoders is still of practical interest. Such decoders are widely deployed and efficient implementations are available. Single- and multi-trial error/erasure decoding builds on these decoders, i.e. they are provided with modified received vectors in which one or multiple sets of most unreliable symbols are erased.

Fig. 4. Simulation results for $\mathcal{C}(2; 127, 36, 31)$.

In this paper, we provided two algorithms for adaptive single-trial error/erasure decoding of binary codes. The erasing strategy of the first algorithm is guaranteed to be optimal. The price for this optimality is computational complexity $O(n^3)$. The second algorithm gives an approximately optimal solution with precision $10^{-2}$. This allows the complexity to be reduced to $O(n^2 \sqrt{n})$. Our simulations show that the performance results of both algorithms are virtually indistinguishable in practical settings. Moreover, the approximate algorithm can easily be adapted to fulfill higher precision requirements.

Since the erasing strategy of our first algorithm is optimal, its residual codeword error probability is guaranteed to be at least as good as that of errors-only decoding and single-trial error/erasure decoding with an optimal fixed erasing strategy. It would be interesting to have an upper bound which proves the gain of adaptive erasing over errors-only and fixed single-trial error/erasure decoding. This bound is the focus of our current investigations.

Our work on the subject is continued with a generalization 
to multiple decoding trials and non-binary channels. This 
will enable our algorithms to be applied in existing coding 
standards which are based on serially concatenated coding 
schemes. 